Abstract

Background: A frequent problem in computational modeling is the interconversion of chemical structures 
between different formats. While standard interchange formats exist (for example, Chemical Markup Language) and 
de facto standards have arisen (for example, SMILES format), the need to interconvert formats is a continuing 
problem due to the multitude of different application areas for chemistry data, differences in the data stored by 
different formats (0D versus 3D, for example), and competition between software along with a lack of
vendor-neutral formats.

Results: We discuss, for the first time, Open Babel, an open-source chemical toolbox that speaks the many 
languages of chemical data. Open Babel version 2.3 interconverts over 110 formats. The need to represent such a 
wide variety of chemical and molecular data requires a library that implements a wide range of cheminformatics 
algorithms, from partial charge assignment and aromaticity detection, to bond order perception and 
canonicalization. We detail the implementation of Open Babel, describe key advances in the 2.3 release, and 
outline a variety of uses both in terms of software products and scientific research, including applications far 
beyond simple format interconversion. 

Conclusions: Open Babel presents a solution to the proliferation of multiple chemical file formats. In addition, it 
provides a variety of useful utilities from conformer searching and 2D depiction, to filtering, batch conversion, and 
substructure and similarity searching. For developers, it can be used as a programming library to handle chemical 
data in areas such as organic chemistry, drug design, materials science, and computational chemistry. It is freely 
available under an open-source license from http://openbabel.org. 



Introduction 

The history of chemical informatics has included a huge 
variety of textual and computer representations of mole- 
cular data. Such representations focus on specific atomic 
or molecular information and may not attempt to store 
all possible chemical data. For example, line notations 
like Daylight SMILES [1] do not offer coordinate infor- 
mation, while crystallographic or quantum mechanical 
formats frequently do not store chemical bonding data. 
Hydrogen atoms are frequently omitted from x-ray
crystallography data due to the difficulty in establishing
their coordinates, and are often omitted by file formats
that rely on the "implicit valence" of heavy atoms to
indicate their presence. Other types of representations
require specification of atom types on the basis of a
specific valence bond model, inclusion of computed
partial charges, indication of biomolecular residues, or
multiple conformations.

While attempts have been made to provide a standard 
format for storing chemical data, including most notably 
the development of Chemical Markup Language (CML) 
[2-6], an XML dialect, such formats have not yet 
achieved widespread use. Consequently, a frequent pro- 
blem in computational modeling is the interconversion 
of molecular structures between different formats, a pro- 
cess that involves extraction and interpretation of their 
chemical data and semantics. 

We outline, for the first time, the development and use
of the Open Babel project, a full-featured open chemical 
toolbox, designed to "speak" the many different repre- 
sentations of chemical data. It allows anyone to search, 
convert, analyze, or store data from molecular modeling, 
chemistry, solid-state materials, biochemistry, or related 
areas. It provides both ready-to-use programs as well as 
a complete, extensible programmer's toolkit for develop- 
ing cheminformatics software. It can handle reading, 
writing, and interconverting over 110 chemical file
formats, supports filtering and searching molecule files
using Daylight SMARTS pattern matching [7] and other 
methods, and provides extensible fingerprinting and 
molecular mechanics frameworks. We will discuss the 
frameworks for file format interconversion, fingerprint- 
ing, fast molecular searching, bond perception and atom 
typing, canonical numbering of molecular structures and 
fragments, molecular mechanics force fields, and the 
extensible interfaces provided by the software library to 
enable further chemistry software development. 

Open Babel has its origin in a version of OELib 
released as open-source software by OpenEye Scientific 
under the GPL (GNU General Public License). In 2001, OpenEye
decided to rewrite OELib in-house as the proprietary 
OEChem library, so the existing code from OELib was 
spun out into the new Open Babel project. Since 2001, 
Open Babel has been developed and substantially 
extended as an international collaborative project using 
an open-source development model [8]. It has over 
160,000 downloads, over 400 citations [9], is used by 
over 40 software projects [10], and is freely available 
from the Open Babel website [11]. 

Features 

File Format Support 

With the release of Open Babel 2.3, Open Babel sup- 
ports 111 chemical file formats in total. It can read 82 
formats and write 85 formats. These encompass com- 
mon formats used in cheminformatics (SMILES, InChI, 
MOL, MOL2), input and output files from a variety of 
computational chemistry packages (GAMESS, Gaussian, 
MOPAC), crystallographic file formats (CIF, ShelX), 
reaction formats (MDL RXN), file formats used by 
molecular dynamics and docking packages (AutoDock, 
Amber), formats used by 2D drawing packages (Chem- 
Draw), 3D viewers (Chem3D, Molden) and chemical 
kinetics and thermodynamics (ChemKin, Thermo). For- 
mats are implemented as "plugins" in Open Babel, 
which makes it easy for users to contribute new file for- 
mats (see Extensible Interface below). Depending on the 
format, other data is extracted by Open Babel in addi- 
tion to the molecular structure; for example, vibrational 
frequencies are extracted from computational chemistry 
log files, unit cell information is extracted from CIF 
files, and property fields are read from SDF files. 

A number of "utility" file formats are also defined; 
these are not strictly speaking a way of storing the 
molecular structure, but rather present certain function- 
ality through the same interface as the regular file for- 
mats. For example, the report format is a write-only 
utility format [12] that presents a summary of the mole- 
cular structure of a molecule; the fingerprint format [13] 
and fastsearch format [14] are used for similarity and
substructure searching (see below); the MolPrint2D and
Multilevel Neighborhoods of Atoms formats calculate cir- 
cular fingerprints defined by Bender et al. [15,16] and 
Filimonov et al. [17,18] respectively. 

Each format can have multiple options to control 
either reading or writing a particular format. For exam- 
ple, the InChI format has 12 options including an 
option "K" to generate an InChIKey, "T <param>" to
truncate the InChI depending on a supplied parameter 
and "w" to ignore certain InChI warnings. The available 
options are listed in the documentation, are shown in 
the Graphical User Interface (GUI) as checkboxes or 
textboxes, and can be listed at the command-line. In
fact, all three are generated from the same source: a
documentation string in the C++ code.

Fingerprints and Fast Searching 

Databases are widely used to store chemical information 
especially in the pharmaceutical industry. A key require- 
ment of such a database is the ability to index chemical 
structures so that they can be quickly retrieved given a 
query substructure. Open Babel provides this functional- 
ity using a path-based fingerprint. This fingerprint, 
referred to as FP2 in Open Babel, identifies all linear 
and ring substructures in the molecule of lengths 1 to 7 
(excluding the 1-atom substructures C and N) and maps 
them onto a bit-string of length 1024 using a hash func- 
tion. If a query molecule is a substructure of a target 
molecule, then all of the bits set in the query molecule 
will also be set in the target molecule. The fingerprints 
for two molecules can also be used to calculate structural
similarity using the Tanimoto coefficient, the number
of bits in common divided by the number of bits in the
union of the two bit sets.
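The screening and similarity steps described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Open Babel's FP2 code: the function names are hypothetical, only linear paths are enumerated (ring closure and bond orders are ignored), and CRC32 stands in for whatever hash function the library actually uses.

```python
import zlib

def path_fingerprint(symbols, bonds, max_len=7, n_bits=1024):
    """Hash all linear atom paths of 1..max_len atoms into an n_bits bit-string.

    symbols: list of element symbols; bonds: list of (i, j) index pairs.
    The bit-string is returned as a Python int.
    """
    neighbors = {i: [] for i in range(len(symbols))}
    for i, j in bonds:
        neighbors[i].append(j)
        neighbors[j].append(i)
    bits = 0

    def extend(path):
        nonlocal bits
        labels = [symbols[i] for i in path]
        # Use the smaller of the forward/reverse label lists so that a path
        # gives the same key whichever end it is walked from.
        key = "-".join(min(labels, labels[::-1]))
        # Skip the uninformative 1-atom fragments C and N.
        if not (len(path) == 1 and labels[0] in ("C", "N")):
            bits |= 1 << (zlib.crc32(key.encode()) % n_bits)
        if len(path) < max_len:
            for nbr in neighbors[path[-1]]:
                if nbr not in path:
                    extend(path + [nbr])

    for start in neighbors:
        extend([start])
    return bits

def may_contain(target_fp, query_fp):
    """Substructure screen: every bit set in the query must be set in the target."""
    return target_fp & query_fp == query_fp

def tanimoto(a, b):
    """Bits in common divided by bits in the union."""
    union = bin(a | b).count("1")
    return bin(a & b).count("1") / union if union else 0.0

# Heavy-atom skeletons of methanol (C-O) and ethanol (C-C-O).
methanol = path_fingerprint(["C", "O"], [(0, 1)])
ethanol = path_fingerprint(["C", "C", "O"], [(0, 1), (1, 2)])
print(may_contain(ethanol, methanol), tanimoto(methanol, ethanol))
```

The screen can only rule candidates out: a target that passes may still fail a full substructure match, because different fragments can hash to the same bit.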

Clearly, repeated searching of the same set of mole- 
cules will involve repeated use of the same set of finger- 
prints. To avoid the need to recalculate the fingerprints 
for a particular multi-molecule file (such as an SDF file), 
Open Babel provides a fastindex format that solely 
stores a fingerprint along with an index into the original 
file. This index greatly increases the speed of
searching for matches to a query; datasets with several
million molecules are easily searched interactively. In
this way, a multi-molecule file may be used as a light- 
weight alternative to a chemical database system. 

Bond Perception and Atom Typing 

As mentioned above, many chemical file formats offer 
representations of molecular data solely as lists of 
atoms. For example, most quantum chemical software 
packages and most crystallographic file formats do not 
offer definitions of bonding. A similar situation occurs 
in the case of the Protein Data Bank (PDB) format; 
while standardized [19] files contain connectivity 
information, non-standard files exist that often do not
provide full connectivity information. Consequently, 
Open Babel features methods for bond connectivity
determination, bond order perception, aromaticity
determination, and atom typing.

Bond connectivity is determined by the frequently
used algorithm of detecting atoms closer than the sum
of their covalent radii, with a slight tolerance (0.45 Å) to
allow for longer than typical bonds. To handle disorder
in crystallographic data (e.g., PDB or CIF files), atoms
closer than 0.63 Å are not bonded. A further filtering
pass is made to ensure standard bond valency is maintained:
each element has a maximum number of bonds, and
if this is exceeded then the longest bonds to an atom
are successively removed until the valence rule is
fulfilled.
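The distance test and the valence filter can be illustrated as follows. This is a sketch only: the covalent radii and maximum bond counts are approximate illustrative values rather than Open Babel's internal tables, and the function name is hypothetical. The 0.45 Å tolerance and 0.63 Å lower cutoff are the values quoted above.

```python
import math
from itertools import combinations

# Approximate covalent radii in Å (illustrative, not Open Babel's table).
COVALENT_RADIUS = {"H": 0.31, "C": 0.76, "N": 0.71, "O": 0.66}
# Illustrative maximum number of bonds per element.
MAX_BONDS = {"H": 1, "C": 4, "N": 4, "O": 2}

TOLERANCE = 0.45     # Å, allows for longer than typical bonds
MIN_DISTANCE = 0.63  # Å, closer atoms are treated as disorder and not bonded

def perceive_bonds(atoms):
    """atoms: list of (element, (x, y, z)). Returns a set of (i, j) pairs, i < j."""
    bonds = {}
    for i, j in combinations(range(len(atoms)), 2):
        (elem_i, pos_i), (elem_j, pos_j) = atoms[i], atoms[j]
        distance = math.dist(pos_i, pos_j)
        if distance < MIN_DISTANCE:
            continue
        if distance <= COVALENT_RADIUS[elem_i] + COVALENT_RADIUS[elem_j] + TOLERANCE:
            bonds[(i, j)] = distance

    # Valence filter: drop the longest bonds of any over-bonded atom.
    for index, (element, _) in enumerate(atoms):
        mine = sorted((b for b in bonds if index in b), key=bonds.get)
        while len(mine) > MAX_BONDS[element]:
            del bonds[mine.pop()]  # pop() removes the longest remaining bond
    return set(bonds)

water = [("O", (0.0, 0.0, 0.0)), ("H", (0.96, 0.0, 0.0)), ("H", (-0.24, 0.93, 0.0))]
print(perceive_bonds(water))
```

The pairwise loop is quadratic in the number of atoms; a production implementation would normally restrict the comparison to nearby atoms.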

After bond connectivity is determined, if needed or 
requested by the user, bond order perception is per- 
formed on the basis of bond angles and geometries. The 
method is similar to that proposed by Roger Sayle [20] 
and uses the average bond angle around an un-typed 
atom to determine sp and sp2 hybridized centers.
5-membered and 6-membered rings are checked for planarity
to estimate aromaticity. Finally, atoms marked as
unsaturated are checked for an unsaturated neighbor to 
give a double or triple bond. After this initial atom typ- 
ing, known functional groups are matched, followed by 
aromatic rings, followed by remaining unsatisfied bonds 
based on a set of heuristics for short bonds, atomic elec- 
tronegativity, and ring membership. 
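The first step of this procedure, classifying a center from its average bond angle, can be sketched as below. The cutoff angles (155° and 115°) are assumptions chosen for illustration and the function names are hypothetical; the text above does not state the thresholds Open Babel uses.

```python
import math
from itertools import combinations

def angle_deg(center, a, b):
    """Angle a-center-b in degrees."""
    va = [p - c for p, c in zip(a, center)]
    vb = [p - c for p, c in zip(b, center)]
    dot = sum(x * y for x, y in zip(va, vb))
    norm = math.hypot(*va) * math.hypot(*vb)
    # Clamp to [-1, 1] to guard against rounding error before acos.
    return math.degrees(math.acos(max(-1.0, min(1.0, dot / norm))))

def guess_hybridization(center, neighbors, sp_cutoff=155.0, sp2_cutoff=115.0):
    """Classify a center as 'sp', 'sp2' or 'sp3' from its average bond angle.

    Returns None when fewer than two neighbors are available.
    The cutoffs are illustrative assumptions.
    """
    pairs = list(combinations(neighbors, 2))
    if not pairs:
        return None
    average = sum(angle_deg(center, a, b) for a, b in pairs) / len(pairs)
    if average > sp_cutoff:
        return "sp"
    if average > sp2_cutoff:
        return "sp2"
    return "sp3"

origin = (0.0, 0.0, 0.0)
print(guess_hybridization(origin, [(1.0, 0.0, 0.0), (-1.0, 0.0, 0.0)]))
```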

Atom typing is performed by "lazy evaluation," match- 
ing atoms against SMARTS patterns to determine hybri- 
dization, implicit valence, and external atom types. 
Atom type perception may be triggered by adding 
hydrogens (which requires determination of implicit and 
explicit valence), exporting to a file format that requires 
atom types, or as requested by the user. To minimize 
the amount of typing required, when importing from a 
format with atom types specified, a lookup table is used 
to translate between equivalent types. 

An important part of atom typing is aromaticity detection
and assignment of Kekulé bond orders (kekulization).
In Open Babel, a central aromaticity model is
used, largely matching the commonly used Daylight
SMILES representation [1], but with added support for
aromatic phosphorus and selenium. Potential aromatic
atoms and bonds are flagged on the basis of membership
in a ring system possibly containing 4n+2 π electrons.
Aromaticity is established only if a well-defined
valence bond Kekulé pattern can be determined. To do
this, atoms are added to a ring system and checked
against the 4n+2 π electron configuration, gradually
increasing the size to establish the largest possible
connected aromatic ring system. Once this ring system is
determined, an exhaustive search is performed to assign
single and double bonds to satisfy all valences in a
Kekulé form. Since this process is exponential in
complexity, the algorithm will terminate if more than 30
levels of recursion or 15 seconds are exceeded (which
may occur in the case of large fused ring systems such
as carbon nanotubes).
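The exhaustive bond-assignment step can be viewed as a search for a perfect matching on the aromatic atoms. The following backtracking sketch shows the idea under simplifying assumptions: every listed atom needs exactly one double bond (atoms such as a pyrrole-type nitrogen would be left out of the input), the function name is hypothetical, and the recursion and time limits mentioned above are omitted.

```python
def kekulize(atoms, bonds):
    """Assign double bonds so that every atom in `atoms` has exactly one.

    atoms: iterable of atom indices that each need one double bond.
    bonds: iterable of (i, j) aromatic bonds between those atoms.
    Returns a frozenset of frozenset({i, j}) double bonds, or None if
    no assignment exists.
    """
    neighbors = {a: set() for a in atoms}
    for i, j in bonds:
        neighbors[i].add(j)
        neighbors[j].add(i)

    def search(unmatched, doubles):
        if not unmatched:
            return doubles
        atom = min(unmatched)  # deterministic choice of the next atom
        for nbr in sorted(neighbors[atom]):
            if nbr in unmatched:
                result = search(unmatched - {atom, nbr},
                                doubles | {frozenset((atom, nbr))})
                if result is not None:
                    return result
        return None  # dead end: backtrack

    return search(frozenset(neighbors), frozenset())

benzene_bonds = [(i, (i + 1) % 6) for i in range(6)]
print(kekulize(range(6), benzene_bonds))
```

A five-membered ring in which all five atoms require a double bond has no solution, so the function returns None; this corresponds to the case where no well-defined Kekulé pattern can be determined.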

Canonical Representation of Molecules 

In general, for any particular molecular structure and 
file format, there are a large number of possible ways 
the structure could be stored; for example, there are N! 
ways of ordering the atoms in an MOL file. While each 
of the orderings encodes exactly the same information, 
it can be useful to define a canonical numbering of the 
atoms of a molecule and use this to derive a canonical 
representation of a molecule for a particular file format. 
For a zero-dimensional file format without coordinates, 
such as SMILES, the canonical representation could be 
used to index a database, remove duplicates or search 
for matches. 

Open Babel implements a sophisticated canonicaliza- 
tion algorithm that can handle molecules or molecular 
fragments. The atom symmetry classes are the initial 
graph invariants and encode topological and chemical 
properties. A cooperative labeling procedure is used to 
investigate the automorphic permutations to find the 
canonical code. Although the algorithm is similar to the 
original Morgan canonical code [21], various improve- 
ments are implemented to improve performance. Most 
notably, the algorithm implements heuristics from the 
popular nauty package [22,23]. Another aspect handled 
by the canonical code is stereochemistry as different 
labelings can lead to different parities. This is further 
complicated by the possibility of symmetry-equivalent 
stereocenters and stereocenters whose configuration is 
interdependent. The full details will be the subject of a 
separate publication. 
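For orientation, the classic Morgan-style refinement of graph invariants can be sketched as follows. This is the textbook extended-connectivity idea, not Open Babel's algorithm, whose details are not given here; the function name is hypothetical and only connectivity is used as the initial invariant.

```python
def morgan_classes(n_atoms, bonds):
    """Refine atom invariants by repeatedly summing neighbor values.

    Starts from the atom degree and stops when a further iteration no
    longer increases the number of distinct values. Returns one value
    per atom; atoms with equal values are candidates for symmetry
    equivalence.
    """
    neighbors = [[] for _ in range(n_atoms)]
    for i, j in bonds:
        neighbors[i].append(j)
        neighbors[j].append(i)
    values = [len(nbrs) for nbrs in neighbors]
    while True:
        refined = [sum(values[n] for n in nbrs) for nbrs in neighbors]
        if len(set(refined)) <= len(set(values)):
            return values
        values = refined

# Carbon skeleton of n-pentane, numbered along the chain.
print(morgan_classes(5, [(0, 1), (1, 2), (2, 3), (3, 4)]))
```

Because the values depend only on connectivity, renumbering the atoms permutes the result but leaves the multiset of values unchanged, which is the property a canonical numbering builds on.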

Coordinate Generation in 2D and 3D 

Open Babel, version 2.3, has support for 2D coordinate 
generation (Figure 1) through the donation of code by 
Sergei Trepalin, based on the code used in the MCDL 
chemical structure editor [24-26]. The MCDL algorithm 
aims to lay out the molecular structure in 2D such that
all bond lengths are equal and all bond angles are close
to 120°. The layout algorithm includes a small database
of around 150 templates to help lay out cages and large
fragment cycles. To deal with the problem of overlap- 
ping fragments, the algorithm includes an exhaustive 
search procedure that rotates around acyclic bonds by 
180°. 

Figure 1 Interconversion of 0D, 2D and 3D structures. The
structures shown are of sertraline, a selective serotonin reuptake
inhibitor (SSRI) used in the treatment of depression. A SMILES string
for sertraline is shown at the top; this can be considered a 0D
structure (only connectivity and stereochemical information). From
this, Open Babel can generate a 2D structure (bottom left, depicted
by Open Babel) or a 3D structure (bottom right, depicted by
Avogadro), and all of these can be interconverted.

Coordinate generation in 3D was introduced in Open
Babel version 2.2, and improved in version 2.3, to enable
conversion from 0D formats such as SMILES to 3D formats
such as SDF (Figure 1). The 3D structure generator
builds linear components from scratch following
geometrical rules based on the hybridization of the 
atoms. Single-conformer ring templates are used for 
ring systems. The template matching algorithm iterates 
through the templates from largest to smallest searching 
for matches. If a match is found, the algorithm con- 
tinues but will not match any ring atoms previously 
templated except in the case of a single overlap (the two 
ring systems of a spiro group) or an overlap involving 
exactly two adjacent atoms (two fused ring systems). 
After an initial structure is generated, the stereochemis- 
try (cis/trans and tetrahedral) is corrected to match the 
input structure. Finally, the energy of the structure is 
minimized using the MMFF94 forcefield [27-31] and a 
low energy conformer found using a weighted rotor 
search. 
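The overlap rule used when accepting a ring template match can be expressed compactly. The sketch below assumes atoms are identified by integer indices and bonds by unordered pairs; the function name is hypothetical and the rest of the template matching procedure is not shown.

```python
def overlap_allowed(match_atoms, templated_atoms, bonds):
    """Decide whether a template match may reuse already-templated atoms.

    Allowed overlaps: none, a single shared atom (spiro junction), or
    exactly two shared atoms that are bonded to each other (ring fusion).
    bonds: set of frozenset({i, j}).
    """
    shared = set(match_atoms) & set(templated_atoms)
    if len(shared) <= 1:
        return True
    if len(shared) == 2:
        return frozenset(shared) in bonds
    return False

# Decalin-like skeleton: two six-membered rings fused at the bond 0-5.
bonds = {frozenset(pair) for pair in
         [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0),
          (5, 6), (6, 7), (7, 8), (8, 9), (9, 0)]}
first_ring = {0, 1, 2, 3, 4, 5}
second_ring = [0, 5, 6, 7, 8, 9]
print(overlap_allowed(second_ring, first_ring, bonds))
```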

While the 3D structure builder produces reasonable 
conformations for molecules without rings or with ring 
systems for which a template exists, the results may be 
poor for molecules with more complex ring systems or 
organometallic species. Future work will be performed 
to compare the results of Open Babel with other pro- 
grams with respect to both speed and the quality of the 
generated structures [32]. 

Stereochemistry 

A recent focus of Open Babel development has been to 
ensure robust translation of stereochemical information 
between file formats. This is particularly important 
when dealing with 0D formats as these explicitly encode
the perceived stereochemistry. Open Babel 2.3 includes 
classes to handle cis/trans double bond stereochemistry,
tetrahedral stereochemistry and square-planar stereochemistry
(this last is still under development), as well
as perception routines for 2D and 3D geometries, and 
routines to query and alter the stereochemistry. 

The detection of stereogenic units starts with an ana- 
lysis of the graph symmetry of the molecule to identify 
the symmetry class of each atom. However, given that a 
complete symmetry analysis also needs to take stereo- 
chemistry into account, this means that the overall 
stereochemistry can only be found iteratively. At each 
iteration, the current atom symmetry classes are used to 
identify stereogenic units. For example, a tetrahedral 
center is identified as chiral if it has four neighbors with 
different symmetry classes (or three, in the case where a 
lone pair gives rise to the tetrahedral shape). 
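A single iteration of the tetrahedral test described above might look like the following sketch. The symmetry classes are taken as given input, the function name and data layout are hypothetical, and the outer loop that recomputes symmetry classes after each round of stereo perception is not shown.

```python
def is_tetrahedral_center(atom, neighbors, symmetry_class, has_lone_pair=False):
    """True if all neighbors of `atom` fall in different symmetry classes.

    neighbors: dict mapping atom index -> list of neighbor indices.
    symmetry_class: dict mapping atom index -> class label.
    Four distinct neighbors are required, or three when a lone pair
    completes the tetrahedral shape.
    """
    classes = [symmetry_class[n] for n in neighbors[atom]]
    required = 3 if has_lone_pair else 4
    return len(classes) == required and len(set(classes)) == required

# CHFClBr: four different substituents on atom 0.
neighbors = {0: [1, 2, 3, 4]}
chiral = {1: "H", 2: "F", 3: "Cl", 4: "Br"}
# CH2FCl: the two hydrogens share a symmetry class.
achiral = {1: "H", 2: "H", 3: "F", 4: "Cl"}
print(is_tetrahedral_center(0, neighbors, chiral),
      is_tetrahedral_center(0, neighbors, achiral))
```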

Forcefields 

Molecular mechanics functions are provided for use 
with small molecules. Typical applications include 
energy evaluation or minimization, alone or as part of a 
larger workflow. The selection of implemented force 
fields allows most molecular structures to be used and 
parameters to be assigned automatically. The MMFF94(s)
force field can be used for organic or drug-like molecules
[27-31]. For molecules containing any element of
the periodic table or complex geometry (i.e. not sup- 
ported by MMFF94), the UFF force field can be used 
instead [33]. Recently, code implementing the GAFF 
force field [34,35] was also contributed and released as 
part of version 2.3. All of the forcefields allow the appli- 
cation of constraints on particular atom positions, or 
particular distances. 

Several conformer searching methods have been 
implemented using the forcefields, all based on the "tor- 
sion-driving" approach. This approach involves setting 
torsion angles from a set of predefined allowed values 
for a particular rotatable bond. The most thorough 
search method implemented is a systematic search 
method, which iterates over all of the allowed torsion 
angles for each rotatable bond in the molecule and 
retains the conformer with the lowest energy. Since a 
systematic search may not be feasible for a molecule 
with multiple rotatable bonds, a number of stochastic 
search methods are also available: the random search 
method, which tries random settings for the torsion 
angles (from the predefined allowed values), and a 
weighted rotor search, a stochastic search method that 
converges on a low energy conformer by weighting par- 
ticular torsion angles based on the relative energy of the 
generated conformer. With Open Babel 2.3, conformer 
search based on a genetic algorithm is also available 
which allows the application of filters (e.g. a diversity fil- 
ter) and different scoring functions. This latter method 
can be used to generate a library of diverse conformers, 
or, like the other methods, to seek a low energy conformer
[36].
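The difference between the systematic and random torsion-driving searches can be shown with a toy example. The energy function here is a hypothetical stand-in (a real search would evaluate a force field energy for each geometry), and the function names are illustrative only.

```python
import itertools
import random

def systematic_search(energy, allowed):
    """Try every combination of allowed torsion values; keep the lowest energy.

    allowed: one list of permitted torsion angles per rotatable bond.
    The number of combinations grows exponentially with the number of bonds.
    """
    return min(itertools.product(*allowed), key=energy)

def random_search(energy, allowed, n_trials, seed=0):
    """Sample random torsion settings from the allowed values."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_trials):
        candidate = tuple(rng.choice(values) for values in allowed)
        if best is None or energy(candidate) < energy(best):
            best = candidate
    return best

def toy_energy(torsions):
    """Hypothetical energy with its minimum at 180 degrees for every torsion."""
    return sum((t - 180.0) ** 2 for t in torsions)

allowed = [[60.0, 180.0, 300.0], [60.0, 180.0, 300.0]]
print(systematic_search(toy_energy, allowed))
print(random_search(toy_energy, allowed, n_trials=5))
```

The systematic search is guaranteed to find the best combination of the allowed values, while the random search trades that guarantee for a fixed cost.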

Implementation 

Technical Details 

Open Babel is implemented in standards-compliant C++.
This ensures support for a wide variety of C++ compilers
(MSVC, GCC, Intel Compiler, MinGW, Clang),
operating systems (Windows, Mac OS X, Linux, BSD,
Windows/Cygwin) and platforms (32-bit, 64-bit). Since
version 2.3, it is compiled using the CMake build system 
[37,38]. This is an open-source cross-platform build sys- 
tem with advanced features for dependency analysis. 
The build system has an associated unit test framework 
CTest, which allows nightly builds to be compiled and 
tested automatically with the results collated and dis- 
played on a centralized dashboard [39]. 

To simplify installation, Open Babel has as few external
dependencies as possible. Where such dependencies
exist, they are optional. For example, if the XML devel- 
opment libraries are not available, Open Babel will still 
compile successfully but none of the XML formats 
(such as Chemical Markup Language, CML) will be 
available. Similarly, if the Eigen matrix and linear alge- 
bra library is not found, any classes that require fast 
matrix manipulation (such as OBAlign, which performs 
least squares alignment) will not be compiled. 

While the majority of the Open Babel library is writ- 
ten in C++, bindings have been developed for a range of 
other programming languages, including Java and the
.NET platform, as well as the so-called "dynamic" scripting
languages Perl, Python, and Ruby. These are automatically
generated from the C++ header files using the
SWIG tool. As described previously [40], in the case of 
Python an additional module is provided named Pybel 
that simplifies access to the C++ bindings. These inter- 
faces facilitate development of web-enabled chemistry 
applications, as well as rapid development and 
prototyping. 

Code Architecture 

The Open Babel codebase has a modular design as 
shown in Figure 2. The goal of this design is threefold: 

1. To separate the chemistry, the conversion process 
and the user interfaces reducing, as far as possible, 
the dependency of one upon another. 

2. To put all of the code for each chemical format in 
one place (usually a single file) and make the addi- 
tion of new formats simple. 

3. To allow the format conversion of not just mole- 
cules, but also any other chemical objects, such as 
reactions. 



[Figure 2 diagram. Modules shown: Command line interface, GUI
interface, and External program; Conversion Control (OBConversion,
OBFormat; obconversion.h); Fastsearch and Fingerprint formats;
Common Formats (OBMoleculeFormat, XMLBaseFormat; obmolecformat.h,
xml.h); Fingerprint API (fingerprint.h); Error Handling (oberror.h).]

Figure 2 Architecture of the Open Babel codebase.



The code base can be considered as consisting of the 
following modules (Figure 2): 

♦ The Chemical Core, which contains OBMol etc. 
and has all of the chemical structure description and 
manipulation. This is the heart of the application 
and its API can be used as a chemical toolbox. It 
has no input/output capabilities. 

♦ The Formats, which read and write to files of dif- 
ferent types. These classes are derived from a com- 
mon base class, OBFormat, which is in the 
Conversion Control module. They also make use of 
the chemical routines in the Chemical Core module. 
Each format file contains a global object of the for- 
mat class. When the format is loaded the class con- 
structor registers the presence of the class with 
OBConversion. This means that the formats are plu- 
gins - new formats can be added without changing 
any framework code. 

♦ Common Formats include OBMoleculeFormat and 
XMLBaseFormat from which most other formats 
(like Format A and Format B in the diagram) are 
derived. Independent formats like Format C are also 
possible. 

♦ The Conversion Control, which also keeps track of 
the available formats, the conversion options and the 
input and output streams. It can be compiled with- 
out reference to any other parts of the program. In 
particular, it knows nothing of the Chemical Core: 
mol.h is not included. 

♦ The User Interface, which may be a command line 
application, a Graphical User Interface (GUI), or 
may be part of another program that uses Open 
Babel's input and output facilities. This depends only 
on the Conversion Control module (obconversion.h
is included), but not on the Chemical Core or on 
any of the Formats. 

♦ The Fingerprint API, as well as being usable in 
external programs, is employed by the fastsearch and 
fingerprint formats. 

♦ The Fingerprints, which are bit arrays that describe 
an object and which facilitate fast searching. They 
are also built as plugins, registering themselves with 
their base class OBFingerprint which is in the Fin- 
gerprint API. 

♦ Other features such as Forcefields, Partial Charge 
Models and Chemical Descriptors, although not 
shown in the diagram, are handled similarly to 
Fingerprints. 

♦ The Error Handling can be used throughout the 
program to log and display errors and warnings. 
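The self-registration pattern described for the Formats module can be illustrated in Python. This is an analogy to the C++ design rather than Open Babel code: the classes Conversion, Format and FormulaFormat are hypothetical stand-ins for OBConversion, OBFormat and a concrete format, and a molecule is reduced to a list of element symbols.

```python
class Conversion:
    """Stand-in for the Conversion Control module: it only knows format IDs."""
    _formats = {}

    @classmethod
    def register(cls, format_id, fmt):
        cls._formats[format_id] = fmt

    @classmethod
    def write(cls, format_id, molecule):
        return cls._formats[format_id].write(molecule)

class Format:
    """Base class: constructing an instance registers it with Conversion."""
    def __init__(self, format_id):
        Conversion.register(format_id, self)

    def write(self, molecule):
        raise NotImplementedError

class FormulaFormat(Format):
    """Toy write-only format that prints element counts."""
    def write(self, molecule):
        counts = {}
        for symbol in molecule:
            counts[symbol] = counts.get(symbol, 0) + 1
        return "".join(f"{s}{n if n > 1 else ''}"
                       for s, n in sorted(counts.items()))

# The module-level instance plays the role of the global format object:
# loading the module is enough to make the format available.
_formula_format = FormulaFormat("formula")

print(Conversion.write("formula", ["C", "H", "H", "H", "H"]))
```

Adding another format requires only a new subclass and a module-level instance; the Conversion class itself is not modified.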

Extensible Interface 

The utility of software libraries such as Open Babel 
depends on the ability of the design to be extended over 
time to support new functionality. To facilitate this, 
Open Babel implements a plugin interface for file for- 
mats, fingerprints, charge models, descriptors, "opera- 
tors" and molecular mechanics force fields. This ensures 
a clean separation of the implementation of a particular 
plugin from the core Open Babel library code, and 
makes it easy for a new plugin (e.g. a new file format) to 
be contributed; all that is needed is a single C++ file and 
a trivial change to one of the build files. The operator 
plugins provide a very general mechanism for operating 
on a molecule (e.g. energy minimization or 3D coordi- 
nate generation) or on a list of molecules (e.g. filtering 
or sorting) after reading but before writing. 

Plugins are dynamically loaded at runtime. This 
decreases the overall disk and memory footprint of 
Open Babel, allowing external developers to choose par- 
ticular functionality needed for their application and 
ignore other, less relevant features. It also allows the 
possibility of third parties distributing plugins separately
from the Open Babel distribution to provide additional
functionality. 

Open-Source License and Open Development 

Open Babel is open-source software, which offers end 
users and third-party developers a range of additional 
rights not granted by proprietary chemistry software. 
Open-source software, at its most basic level, grants 
users the rights to study how their software works, to 
adapt it for any purpose or otherwise modify it, and to 
share the software and their modifications with others. 
In this sense, Open Source functions in similar ways to 
the processes of open peer review, publication, and 



citation in science. The rights granted by open source 
licenses largely coincide with the norms of scientific 
ethics to enable verifiability, repeatability, and building 
on previous results and theories. 

Beyond these rights, Open Babel (like most other 
open-source projects) offers open development - that is, 
all development occurs in public forums and with public 
code repositories. This results in greater input from the 
community as any user can easily submit bug reports or 
feature suggestions, get involved in discussions on the 
future direction of Open Babel or even become a devel- 
oper him/herself. In practice, the number of active con- 
tributors has increased over time through this level of 
open, public development (Figure 3). Moreover, it 
means that the development of the code is completely 
transparent and the quality of the software is available 
for public scrutiny. Indeed, since its inception, over 658 
bugs have been submitted to the public tracker and 
fixed [41]. 

Figure 3 Number of contributors over time. Note that this graph
only includes developers who directly committed code to the Open
Babel source code repository, and does not include patches
provided by users.

Validation and Testing

Open Babel includes an extensive test suite comprising
60 different test programs each with tens to hundreds of
tests. In early 2010, a nightly build infrastructure and
dashboard was put in place with support from Kitware,
Inc. This has greatly improved code quality by catching
regressions, and also ensures that the code compiles
cleanly on all platforms and compilers supported by
Open Babel. Some examples of tests that are run each
night are:

(1) The MMFF94 forcefield code is tested against the
MMFF94 validation suite.

(2) The OBAlign class, which was developed using
Test-Driven Development (TDD) methodology, is
run against its test suite.

(3) Handling of symmetry is validated by converting 
several test cases between SMILES, 2D and 3D SDF, 
and InChI (there are also several test programs with 
unit tests for the individual stereo classes in the 
API). 

(4) The SMARTS parser is tested using over 250 
valid and invalid SMARTS patterns, and the 
SMARTS matcher is tested using 125 basic 
SMARTS patterns. 

(5) The LSSR (Least Set of Smallest Rings) code is 
tested for invariance against changing the atom 
order for a series of polycyclic molecules. 

Recently the development team has placed a major 
focus on increasing the robustness of file format transla- 
tion particularly in relation to the commonly used 
SMILES and MDL Molfile formats. Translating between 
these formats requires accurate stereochemistry percep- 
tion, inference of implicit hydrogens, and kekulization of 
delocalized systems. While it is difficult to ensure that 
any complex piece of code is free of bugs, and Open 
Babel is no exception, validation procedures can be car- 
ried out to assess the current level of performance and 
to find additional test cases that expose bugs. The fol- 
lowing procedure was used to guide the rewriting of 
stereochemistry code in Open Babel, a project that 
began in early 2009. Starting with a dataset of 18,084 
3D structures from PubChem3D as an SDF file, we 
compared the result of (a) conversion to SMILES, fol- 
lowed by conversion of that to Canonical SMILES to (b) 
conversion directly to Canonical SMILES. This proce- 
dure can be used to flush out errors in reading the ori- 
ginal SDF file, reading/writing SMILES (either due to 
stereochemistry errors or kekulization problems), and is 
also a test (to some extent) of the canonicalization code. 
At the time of starting this work (March 2009), the 
error rate found was 1424 (8%); by Oct 2009, combined 
work on stereochemistry, kekulization and canonicaliza- 
tion had reduced this to 190 (-1%), and continued 
improvements have reduced the number of errors down 
to two (shown in Figure 4) for Open Babel 2.3.1 
(-0.01%). The first failure is due to a kekulization error 
in a polycyclic aromatic molecule incorporating heteroa- 
toms: (a) gave clccc2c(cl)cl [nH] [nH]c3c4clc(c2) 
ccc4cclc3ccccl while (b) gave clccc2c(cl)clnnc3c4clc 
(c2)ccc4cclc3ccccl. This error led to confusion over 
whether or not the aromatic nitrogens have hydrogens 
attached (they do not). The second failure involves con- 
fusion over the canonical stereochemistry at a bridge- 
head carbon: (a) gave C1CN2[C@@H](C1)CCC2 while 
(b) gave C1CN2[C@H](C1)CCC2. This is actually a
meso compound, and so both SMILES strings are correct
and represent the same molecule. However, the
canonicalization algorithm should have chosen one
stereochemistry or the other for the canonical
representation.

Figure 4 The two failures found in the validation test for
reading/writing SMILES. The structures shown are PubChem CID 9107
and PubChem CID 12558.
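The comparison of routes (a) and (b) can be sketched as a small test harness. This is a minimal illustration, not the script used by the authors: the three converter arguments are hypothetical callables standing in for whatever toolkit calls perform the conversions, and the function names are our own.

```python
from typing import Callable, Iterable, List, Tuple


def roundtrip_failures(
    records: Iterable[str],
    sdf_to_smiles: Callable[[str], str],
    smiles_to_canonical: Callable[[str], str],
    sdf_to_canonical: Callable[[str], str],
) -> List[Tuple[str, str, str]]:
    """Return (record, route_a, route_b) wherever the two routes disagree.

    The converters are placeholders (assumptions); plug in real
    toolkit calls to run the test on actual structures.
    """
    failures = []
    for record in records:
        # Route (a): structure -> SMILES -> canonical SMILES
        route_a = smiles_to_canonical(sdf_to_smiles(record))
        # Route (b): structure -> canonical SMILES directly
        route_b = sdf_to_canonical(record)
        if route_a != route_b:
            failures.append((record, route_a, route_b))
    return failures


def error_rate_percent(n_failures: int, n_total: int) -> float:
    """Failures as a percentage of the dataset size."""
    return 100.0 * n_failures / n_total
```

Applied to the counts quoted above, `error_rate_percent(1424, 18084)` is about 7.9, `error_rate_percent(190, 18084)` is about 1.05, and `error_rate_percent(2, 18084)` is about 0.011, consistent with the rounded percentages in the text.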

Another area of focus was the canonicalization algorithm,
which can be used to generate canonical SMILES
as well as other formats. The algorithm can be tested by
ensuring that the same canonical SMILES string is
obtained even when the order of atoms in a molecule is
changed (while retaining the same connection table).
The test stresses all areas of the library, including aromaticity
perception, kekulization, stereochemistry, and
canonicalization. The development of the canonicalization
code in Open Babel was guided by applying this
test to the 5,151,179 molecules in the eMolecules catalogue
(dated 2011-01-02) with 10 random shuffles of the
atom order. At the time of the Open Babel 2.2.3 release,
there were 24,404 failures of the canonicalization algorithm;
this has now been reduced to only four (shown
in Figure 5, < 0.001%). The Open Babel nightly test
suite ensures that this test passes for a number of problematic
molecules. Although the canonicalization algorithm
is still not perfect, we believe that the current
level of performance (99.99992% success on the eMolecules
catalogue) is acceptable for general use, and with
time we intend to improve performance further.
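The atom-order shuffle test can be illustrated with a toy connection table. The sketch below is an assumption-laden stand-in: `toy_canonical_form` is a brute-force labelling that is only practical for very small molecules and is not Open Babel's canonicalization algorithm, and the data layout (a list of element symbols plus a list of `(i, j, order)` bonds) is chosen purely for illustration.

```python
import itertools
import random


def relabel(atoms, bonds, perm):
    """Renumber a connection table; perm maps old index -> new index."""
    new_atoms = [None] * len(atoms)
    for old, new in enumerate(perm):
        new_atoms[new] = atoms[old]
    new_bonds = [(perm[i], perm[j], order) for i, j, order in bonds]
    return new_atoms, new_bonds


def table_key(atoms, bonds):
    """Order-sensitive key: atom tuple plus normalized, sorted bonds."""
    norm = sorted((min(i, j), max(i, j), order) for i, j, order in bonds)
    return tuple(atoms), tuple(norm)


def toy_canonical_form(atoms, bonds):
    """Smallest key over all relabellings (toy method, tiny molecules only)."""
    return min(
        table_key(*relabel(atoms, bonds, perm))
        for perm in itertools.permutations(range(len(atoms)))
    )


def shuffle_test(atoms, bonds, canonicalize, n_shuffles=10, seed=0):
    """True if canonicalize() gives the same result for every shuffle."""
    rng = random.Random(seed)
    reference = canonicalize(atoms, bonds)
    for _ in range(n_shuffles):
        perm = list(range(len(atoms)))
        rng.shuffle(perm)
        if canonicalize(*relabel(atoms, bonds, perm)) != reference:
            return False
    return True
```

A labelling that simply echoes the input order (such as `table_key` on its own) changes when the atoms are renumbered, which is exactly the kind of failure the test is designed to expose. For the success rate quoted above, four failures in 5,151,179 molecules gives 100 × (1 − 4/5,151,179) ≈ 99.99992%.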

Given that the error rate for canonicalization and 
handling of stereochemistry is now quite low, the next 
area of focus for the Open Babel development team is 
to improve the handling of implicit valence for "unusual 
atoms." This is particularly important for organometallic 
species and inorganic complexes. 

Using Open Babel 

Applications 

The Open Babel package is composed of a set of user
applications as well as a programming library. The main
command line application provided is obabel (a small
upgrade on the earlier babel), which facilitates file format
conversion, filtering (by SMARTS, title, descriptor
value, or property field), 3D or 2D structure generation,
conversion of hydrogens from implicit to explicit (and
vice versa), and removal of small fragments or of duplicate
structures. A number of features are provided to
handle multi-molecule file formats (such as SDF or
MOL2) and to use or manipulate the information in
property fields and molecule titles. Here is an example
of using obabel to convert from SDF format to SMILES:

obabel inputmols.sdf -O outputmols.smi

A more complicated use would be to extract all molecules
in an SDF file whose titles start with "active":

obabel inputmols.sdf -aT -o copy -O outputmols.sdf --filter "title='active*'"

The copy format specified by "-o copy" is a utility format
that copies the exact contents of the input file (for
the filtered molecules) directly to the output, without
perception or interpretation. The "-aT" option indicates that
only the title of each molecule in the input SDF file should
be read; full chemical perception is not required.
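To make the idea of a title-only read combined with a raw copy concrete, the following sketch filters SDF text without interpreting any chemistry. It is an illustration of the concept only and does not reflect how obabel implements the copy format; the function name is hypothetical. It relies on two properties of the SDF format: records end with a "$$$$" line, and the first line of each record is the molecule title.

```python
from fnmatch import fnmatchcase


def filter_sdf_by_title(sdf_text, pattern):
    """Return the raw text of each SDF record whose title matches a glob.

    No atoms or bonds are parsed: each matching record is copied
    verbatim, in the spirit of "-aT -o copy" described above.
    """
    kept = []
    record = []
    for line in sdf_text.splitlines(keepends=True):
        record.append(line)
        if line.strip() == "$$$$":
            # The title is the first line of the record
            title = record[0].strip()
            if fnmatchcase(title, pattern):
                kept.append("".join(record))
            record = []
    return "".join(kept)
```

Calling `filter_sdf_by_title(text, "active*")` on the contents of an SDF file keeps the records titled "active_1" or "active_2" and drops one titled "inactive_1", since the pattern must match from the start of the title.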

The Open Babel graphical user interface (GUI) provides
the same functionality. Figure 6 is a screenshot of
the GUI carrying out the same filtering operation
described in the obabel example above. The left panel
deals with setting up the input file, the right panel handles
the output, and the central panel is for setting conversion
options. Depending on whether a particular
option requires a parameter, the available options are
displayed either as check boxes or as text entry boxes.
These interface elements are generated dynamically
directly from the text description and help text provided
by each format plugin.

Programming Library 

The Open Babel library allows users to write chemistry
applications without worrying about the low-level details
of handling chemical information, such as how to read
or write a particular file format, or how to use SMARTS
for substructure searching. Instead, the user can focus
on the scientific problem at hand, or on creating an
easier-to-use interface (e.g. a GUI) to some of Open
Babel's functionality. The Open Babel API (Application
Programming Interface) is the set of classes, methods
and variables provided by Open Babel to the user for
use in programs. Documentation on the complete API
(generated using Doxygen [42]) is available from the
Open Babel website [43], or can be generated from the
source code.

The functionality provided by the Open Babel library
is relied upon by many users and by several other software
projects, with the result that introducing changes
to the API would cause existing software to break. For
this reason, Open Babel strives to maintain API stability
over long periods of time, so that existing software
will continue to work despite the release of new Open
Babel versions with additional features, file formats
and bug fixes. Open Babel uses a version numbering
system that indicates how the API has changed with
every release:

♦ Bug fix releases (e.g. 2.0.0 versus 2.0.1) do not
change the API at all

♦ Minor version releases (e.g. 2.0 versus 2.1) will add
to the API, but will otherwise be backwards-compatible

♦ Major version releases (e.g. 2 versus 3) are not
backwards-compatible, and have changes to the API
(including removal of deprecated classes and
functions)
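The three rules above can be expressed as a short compatibility check. This is a sketch under the assumption that version strings are dot-separated integers; the helper names are hypothetical and are not part of the Open Babel API.

```python
def parse_version(text):
    """Turn '2.3' or '2.3.1' into a (major, minor, bugfix) tuple."""
    parts = [int(p) for p in text.split(".")]
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts[:3])


def is_api_compatible(built_against, installed):
    """True if code written for `built_against` should work with `installed`.

    Major versions must match; the installed minor version must be at
    least the one the code was written for; bug fix numbers are ignored
    because bug fix releases do not change the API.
    """
    b_major, b_minor, _ = parse_version(built_against)
    i_major, i_minor, _ = parse_version(installed)
    if b_major != i_major:
        return False
    return i_minor >= b_minor
```

For example, software written against 2.0 should keep working with 2.1, whereas software that uses API added in 2.1 is not guaranteed to work with 2.0, and nothing is guaranteed across the 2 to 3 boundary.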

Figure 7 shows an example C++ program that uses the 
two main classes OBConversion and OBMol to print 
out the molecular weight of all of the molecules in an 
SDF file. This could be used, for example, to investigate 
differences in the molecular weight distribution between 
two databases. The same program is shown in Figure 8 
but implemented using the Python bindings. 

Examples of Use 

Open Babel has already been referenced over 400 times
for various uses. The most common use of Open Babel
is through the obabel command line application (or the
corresponding graphical user interface) for the interconversion
of chemical file formats. Such conversions may
also involve the calculation or inference of additional
molecular information or application of a filter. Some

Figure 6 Screenshot of the Open Babel GUI. In the screenshot, the Open Babel GUI is running on Bio-Linux 6.0, an Ubuntu derivative. 



published examples of these include the following: 



♦ interconversion of chemical file formats or
representations [44-47]

♦ addition of hydrogens [48-50] 

♦ generation of 3D molecular structures [51-53] 



#include <iostream>
#include <openbabel/obconversion.h>
#include <openbabel/mol.h>

using namespace OpenBabel;

int main(int argc, char **argv)
{
    // Set up a reader for SDF input
    OBConversion conv;
    conv.SetInFormat("sdf");
    OBMol mol;

    // ReadFile() opens the file and reads the first molecule
    bool notatend = conv.ReadFile(&mol, "dataset.sdf");
    while (notatend)
    {
        std::cout << "Molecular Weight: "
                  << mol.GetMolWt() << std::endl;

        // Reset the molecule and read the next one, if any
        mol.Clear();
        notatend = conv.Read(&mol);
    }

    return 0;
}

Figure 7 Example C++ program that uses the Open Babel 
library. The program prints out the molecular weight of each 
molecule in the SDF file "dataset.sdf". 



♦ calculation of partial charges [54,55] 

♦ generation of molecular fingerprints [56-59] 

♦ removal of duplicate molecules from a dataset [60] 

♦ calculation of MOL2 atom types [61] 

An interesting example that shows how a particular
chemical representation may be used to facilitate a
scientific study is the crystallographic study of Fabian
and Brock, who used Open Babel to generate InChI
strings for molecules in the Cambridge Structural Database
[62]. Exploiting the fact that InChIs of enantiomers
are identical except at the enantiomer sublayer ("/m0"



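import openbabel as ob

# Set up a reader for SDF input
conv = ob.OBConversion()
conv.SetInFormat("sdf")
mol = ob.OBMol()

# ReadFile() opens the file and reads the first molecule
notatend = conv.ReadFile(mol, "dataset.sdf")
while notatend:
    print("Molecular Weight: %f" % mol.GetMolWt())

    # Reset the molecule and read the next one, if any
    mol.Clear()
    notatend = conv.Read(mol)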

Figure 8 Example Python program that uses the Open Babel 
library. The program prints out the molecular weight of each 
molecule in the SDF file "dataset.sdf". 
Table 1 Software applications and libraries that use Open Babel

| Name | Description | Reference | Web page |
|---|---|---|---|
| Avogadro | GUI for molecular modelling and computational chemistry | G. Hutchison, M. Hanwell | http://avogadro.openmolecules.net/ |
| cclib | Parse computational chemistry output files | [72] | http://cclib.sf.net/ |
| CCP1GUI | GUI for computational chemistry | Jens Thomas | http://www.cse.scitech.ac.uk/ccg/software/ccp1gui |
| ChemAzTech | Manage a chemical laboratory database | Remy Dernat | http://chemaztech.sf.net/ |
| ChemSpotlight | Chemistry file indexer for MacOSX | G. Hutchison | http://chemspotlight.openmolecules.net/ |
| ChemT | GUI for generating combinatorial libraries | Rui Abreu | http://www.esa.ipb.pt/~ruiabreu/chemt |
| ChemTool | 2D molecular drawing | [73] | http://ruby.chemie.uni-freiburg.de/~martin/chemtool |
| CMDF | Library for handling and preparing multi-scale multi-paradigm simulations | [74] | http://web.mit.edu/mbuehler/www/research/CMDF/CMDF.htm |
| Confab | Systematically generate conformers | [36] | http://confab.googlecode.com/ |
| DockoMatic | Automate the preparation and analysis of AutoDock runs | [75] | http://sf.net/projects/dockomatic/ |
| DOVIS 2.0 | Automate the preparation and analysis of AutoDock runs | [76] | http://www.bhsai.org/dovis.html |
| FAF-Drugs2 | ADMET filtering of molecular datasets | [77] | http://www.mti.univ-paris-diderot.fr/fr/downloads.html |
| FMiner2 | Large-scale chemical graph mining based on backbone refinement classes | [78,79] | http://www.maunz.de/wordpress/bbrc |
| Ghemical | GUI for computational chemistry | Tommi Hassinen | http://www.uku.fi/~thassine/projects/ghemical |
| Gnome Chemistry Utils | 2D chemical editor, 3D viewer, chemical calculator and periodic table for Linux | Jean Brefort | http://gchemutils.nongnu.org/ |
| iBabel | MacOSX interface to Open Babel and other Open chemistry tools | Chris Swain | http://homepage.mac.com/swain/Sites/Macinchem/page65/ibabel3.html |
| Kalzium | GUI showing information on the periodic table of the elements | Carsten Niehaus | http://edu.kde.org/kalzium/ |
| Lazar | Lazy Structure-Activity Relationships for toxicity prediction | [80] | http://www.in-silico.de/software/ |
| Molekel | GUI for computational chemistry | Ugo Varetto | http://molekel.cscs.ch/ |
| molsKetch | 2D chemical editor | Harm van Eersel | http://molsketch.sf.net/ |
| MyChem | Chemistry extension to the MySQL database | J. Pansanel | http://mychem.sf.net/ |
| NanoEngineer-1 | Computer-aided design for the nanoscale | Nanorex, Inc. | http://nanoengineer-1.net/ |
| NanoHive-1 | Simulator for the study, experimentation, and development of nanotech entities | Brian Helfrich | http://www.nanohive-1.org/ |
| OpenMD | Open Source molecular dynamics engine | [81] | http://openmd.net/ |
| Open3DQSAR | High-throughput chemometric analysis of molecular interaction fields | [82,83] | http://www.open3dqsar.org/ |
| OSRA | Extracts chemical structures from images | [84] | http://osra.sf.net/ |
| PgChem | Chemistry extension to the PostgreSQL database | Ernst-Georg Schmidt | http://pgfoundry.org/projects/pgchem |
| Pharao | Pharmacophore discovery and searching | Silicos NV | http://www.silicos.be/ |
| Pharmer | Pharmacophore searching | [85] | http://smoothdock.ccbb.pitt.edu/pharmer |
| Piramid | Shape-based alignment of molecules | Silicos NV | http://www.silicos.be/ |
| PyADF | Library for handling and preparing quantum mechanical multi-scale simulations | [86] | http://www.ipc.kit.edu/cfn-ysg/158.php |
| PyRx | GUI for virtual screening with protein-ligand docking | Sargis Dallakyan | http://pyrx.scripps.edu/ |
| QMForge | GUI for analysing results of quantum chemistry calculations | [72] | http://qmforge.sf.net/ |
| RMG | Reaction Mechanism Generator | [87] | http://rmg.sf.net/ |
| Sci3D | Interactive visualization of 3D models of scientific data, such as molecular structures and surfaces | T.J. O'Donnell | http://sci3d.sf.net/ |
| Sieve | Filter molecules from datasets | Silicos NV | http://www.silicos.be/ |
| SMIREP | Generation of fragment-based structure-activity relationships | [88] | http://www.karwath.org/systems/smirep.html |
| Stripper | Extract molecular scaffolds | Silicos NV | http://www.silicos.be/ |
| Toxtree | Toxic hazard estimation using decision trees | Ideaconsult Ltd. | http://toxtree.sf.net/ |
| V_Sim | Visualize atomic structures such as crystals and grain boundaries | Damien Caliste | http://inac.cea.fr/L_Sim/V_Sim/index.en.html |
| WebBabel | Web application for file format conversion | T.J. O'Donnell | http://webbabel.sf.net/ |
| XDrawChem | 2D molecular editor | Bryan Herger | http://xdrawchem.sf.net/ |
| XtalOpt | Extension to Avogadro for crystal-structure prediction | [89] | http://xtalopt.openmolecules.net/ |
| YASARA | GUI for molecular graphics, modeling and simulation | Elmar Krieger | http://www.yasara.org/ |
| ZODIAC | GUI for molecular modelling and docking | [90] | http://www.zeden.org/ |



or "/m1"), they used the InChIs as part of a workflow to
identify kryptoracemates (a class of racemic crystals
where the enantiomers are not related by space-group
symmetry) in the database.
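The property being exploited, that two enantiomers share every InChI layer except the "/m0" or "/m1" sublayer, can be checked with simple string handling. The sketch below is our own illustration and not the workflow of Fabian and Brock; the function names are hypothetical, and it assumes single-component InChIs in which the enantiomer sublayer appears as exactly "m0" or "m1".

```python
ENANTIOMER_SUBLAYERS = ("m0", "m1")


def enantiomer_sublayer(inchi):
    """Return 'm0' or 'm1' if present among the layers, else None."""
    for layer in inchi.split("/")[1:]:
        if layer in ENANTIOMER_SUBLAYERS:
            return layer
    return None


def without_enantiomer_sublayer(inchi):
    """Rebuild the InChI with the /m0 or /m1 sublayer removed."""
    return "/".join(
        layer for layer in inchi.split("/")
        if layer not in ENANTIOMER_SUBLAYERS
    )


def look_like_enantiomers(inchi_a, inchi_b):
    """True if the strings differ only in the enantiomer sublayer."""
    m_a = enantiomer_sublayer(inchi_a)
    m_b = enantiomer_sublayer(inchi_b)
    if m_a is None or m_b is None:
        return False
    return m_a != m_b and (
        without_enantiomer_sublayer(inchi_a)
        == without_enantiomer_sublayer(inchi_b)
    )


# Example strings intended to be those of L- and D-alanine; treat them
# as illustrative and confirm against an InChI generator before reuse.
L_ALA = "InChI=1S/C3H7NO2/c1-2(4)3(5)6/h2H,4H2,1H3,(H,5,6)/t2-/m0/s1"
D_ALA = "InChI=1S/C3H7NO2/c1-2(4)3(5)6/h2H,4H2,1H3,(H,5,6)/t2-/m1/s1"
```

With these inputs, `look_like_enantiomers(L_ALA, D_ALA)` is true, while comparing either string with itself is false. Grouping a set of structures by `without_enantiomer_sublayer` is one simple way to bring candidate enantiomer pairs together.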

To implement new methods, or access additional molecular
information, it is necessary to use the Open Babel
library directly, either from C++ or using one of the supported
language bindings. Some examples of published
studies that have done this include the following:

♦ Dehmer et al. implemented molecular complexity
measures based on information theory [63].

♦ Langham and Jain developed a model for chemical
mutagenicity based on atom pair features [64].

♦ Fontaine et al. implemented a method, anchor-GRIND,
that uses an anchor point of a molecular
scaffold to compare molecular interaction fields
when different substituents are present [65].

♦ Konyk et al. have developed a plugin for Open
Babel that adds support for the Web Ontology Language
(OWL) to allow automated reasoning about
chemical structures [66].

♦ Kogej et al. (AstraZeneca) implemented a 3-point
pharmacophore fingerprint called TRUST [67].



Table 2 Web applications and databases that use Open Babel

| Name | Description | Reference | Web page |
|---|---|---|---|
| ChemDB | Database of small molecules | [91] | http://cdb.ics.uci.edu/ |
| Chemeo | Chemical structure and property search engine | Ceondo Ltd | http://www.chemeo.com/ |
| ChemMine Tools | Web application for analysing and clustering small molecules | [92] | http://chemmine.ucr.edu/ |
| eMolecules | Chemical vendor search engine | eMolecules.com | http://emolecules.com/ |
| FragmentStore | Database for comparison of fragments found in metabolites, drugs and toxic compounds | [93] | http://bioinf-applied.charite.de/fragment_store/ |
| Frog2 | FRee Online druG 3D conformation generation | [94] | http://bioserv.rpbs.univ-paris-diderot.fr/cgi-bin/Frog2 |
| hBar Lab | Web application providing on-demand access to computer-aided chemistry | hBar Solutions ApS | https://www.hbar-lab.com/ |
| IUPHAR-DB | Database of human drug targets and their ligands | [95] | http://www.iuphar-db.org/ |
| OpenCDLig | Web application for sharing resources about cyclodextrin/ligand complexes | [96] | https://kdd.di.unito.it/casmedchem/ |
| PSMDB | Protein - Small-Molecule Database | [97] | http://compbio.cs.toronto.edu/psmdb/ |
| SambVca | Web application for calculation of buried volume of organometallic ligands | [98] | https://www.molnac.unisa.it/OMtools/sambvca.php |
| ScafBank | Database of molecular scaffolds | [99] | http://202.127.30.184:8080/scafbank.html |
| SMARTCyp | Web application for prediction of sites of cytochrome P450 mediated metabolism | [100] | http://www.farma.ku.dk/smartcyp/ |
| sMol Explorer | Web application for exploring small-molecule datasets | [101] | http://www3a.biotec.or.th/isl/index.php/smol-explorer |
| Superimpose | Web application for structural similarity between ligands, binding sites or proteins | [102] | http://farnsworth.charite.de/superimpose-web/ |
| SuperToxic | Database of toxic compounds | [103] | http://bioinformatics.charite.de/supertoxic/ |
| SuperSite | Detailed information on, and comparisons of, protein-ligand binding sites | [104] | http://bioinf-tomcat.charite.de/supersite/ |
| SuperSweet | Database of natural and artificial sweeteners | [105] | http://bioinf-applied.charite.de/sweet/ |
| STITCH2 | Chemical-protein interactions | [106] | http://stitch.embl.de/ |
| VCCLAB | Virtual Computational Chemistry Laboratory | [107] | http://www.vcclab.org/ |
| wwLigCSRre | Web application that performs ligand-based screening using 3D similarity | [108] | http://bioserv.rpbs.univ-paris-diderot.fr/Help/wwLigCSRre.html |
♦ Many other examples exist [68-71].

The vital role that a cheminformatics toolkit plays in 
the development of scientific resources is shown by 
Tables 1 and 2. Table 1 lists examples of stand-alone 
applications or programming libraries that rely on Open 
Babel, either calling the library directly or via one of the 
command-line executables. Table 2 contains examples 
of web applications and databases that either use Open 
Babel on the server or where Open Babel was used in 
the preparation of the data. 

Conclusions 

In November 2011, Open Babel will mark 10 years of
existence as an independent project, and for the first
time, we have discussed its development and features.
As shown by more than 400 citations, it has become an
essential tool for handling the myriad of molecular file
formats encountered in diverse branches of chemistry.
While more work remains to be done, through validation
processes such as those described above and the
recent introduction of a nightly build and testing framework,
we aim to improve the quality and robustness of
the toolkit with each new release.

Looking to the future, one of the goals of the
project is to extend support to molecules that currently
are not handled very well by existing cheminformatics
toolkits. Typically toolkits focus on the types of molecules
of principal importance to the pharmaceutical
industry, namely stable organic molecules composed
wholly of 2-center 2-electron covalent bonds. Molecules
outside this set - such as radicals, organometallic and
inorganic molecules, molecules with coordinate bonds
or 3-center 2-electron bonds - are poorly supported in
general. Future releases of Open Babel will provide substantially
improved handling of such species. We also
seek to improve the speed and coverage of important methods
such as structure generation, kekulization and
canonicalization.

Open Babel is freely available from http://openbabel.org,
and new community members are very welcome
(users, developers, bug reporters, feature requesters). For
information on how to use Open Babel, please see the
documentation at http://openbabel.org/docs and the API
documentation at http://openbabel.org/api.

Availability and Requirements 

Project Name: Open Babel 

Project home page: http://openbabel.org 

Operating system(s): Cross-platform 

Programming language: C++, bindings to Python, 
Perl, Ruby, Java, C# 

Other requirements (if compiling): CMake 2.4+ 

License: GNU GPL v2 



Any restrictions to use by non-academics: None 